Multimodal Context for Natural Question and Response Generation

Authors

  • Nasrin Mostafazadeh
  • Chris Brockett
  • Bill Dolan
  • Michel Galley
  • Jianfeng Gao
  • Georgios P. Spithourakis
  • Lucy Vanderwende
Abstract

The popularity of image sharing on social media reflects the important role visual context plays in everyday conversation. In this paper, we present a novel task, Image-Grounded Conversations (IGC), in which natural-sounding conversations are generated about shared photographic images. We investigate this task using training data derived from image-grounded conversations on social media and introduce a new dataset of crowd-sourced conversations for benchmarking progress. Experiments using deep neural network models trained on social media data show that the combination of visual and textual context can enhance the quality of generated conversational turns. In human evaluation, a gap between human performance and that of both neural and retrieval architectures suggests that IGC presents an interesting challenge for vision and language research.


Similar resources

Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation

The popularity of image sharing on social media and the engagement it creates between users reflect the important role that visual context plays in everyday conversations. We present a novel task, Image-Grounded Conversations (IGC), in which natural-sounding conversations are generated about a shared image. To benchmark progress, we introduce a new multiple-reference dataset of crowd-sourced, eve...


Evaluating Questions in Context

We present an evaluation methodology and a system for ranking questions within the context of a multimodal tutorial dialogue. Such a framework has applications for automatic question selection and generation in intelligent tutoring systems. To create this ranking system we manually author candidate questions for specific points in a dialogue and have raters assign scores to these questions. To ...


Answering Questions about Moving Objects in Surveillance Videos

Current question answering systems succeed in many respects regarding questions about textual documents. However, information exists in other media, which provides both opportunities and challenges for question answering. We present results in extending question answering capabilities to video footage captured in a surveillance setting. Our prototype system, called Spot, can answer questions ab...


Answering Questions About Moving Objects in Videos

Current question answering systems succeed in many respects regarding questions about textual documents. However, information exists in other media, which provides both opportunities and challenges for question answering. We describe our efforts in extending question answering capabilities to video data: our implemented prototype, Spot, can answer questions about moving objects in a surveillanc...


Plan-Based Integration of Natural Language and Graphics Generation

W. Wahlster, E. André, W. Finkler, H.-J. Profitlich and T. Rist, Plan-based integration of natural language and graphics generation, Artificial Intelligence 63 (1993) 387-427. Multimodal interfaces combining natural language and graphics take advantage of both the individual strength of each communication mode and the fact that several modes can be employed in parallel. The central claim of thi...




Publication date: 2017